Machine Translation for Entity Recognition across Languages in Biomedical Documents
نویسندگان
چکیده
We report on our experiments for the CLEF 2013 Entity Recognition Challenge. Our approach is based on a combination of machine translation and NE tagging techniques. The Silver Standard Corpus (SSC) is used to obtain a corresponding annotated corpus in the target language. The plain text of the SSC is translated and a mapping is created between entities in the original and phrases in the translation, to which are associated the same CUIs as in the original. This produces a Bronze Standard Corpus (BSC) in the target language. A dictionary of entities is also created, which associates to each pair (entity text, semantic group) the corresponding CUIs that appeared in the SSC. The BSC is used to train a model for a Named Entity tagger. The model is used for tagging entities in sentences in the target language with the proper semantic group and the entity dictionary is used for associating CUIs to each of them.
منابع مشابه
تشخیص اسامی اشخاص با استفاده از تزریق کلمههای نامزد اسم در میدانهای تصادفی شرطی برای زبان عربی
Named Entity Recognition and Extraction are very important tasks for discovering proper names including persons, locations, date, and time, inside electronic textual resources. Accurate named entity recognition system is an essential utility to resolve fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
متن کاملMultilingual Semantic Resources and Parallel Corpora in the Biomedical Domain: the CLEF-ER Challenge
Multilingual terminological resources can be drawn from parallel corpora in the languages of interest, possibly exploiting machine translation solutions for term identification. This main objective of the CLEF-ER challenge involves parallel corpora in English and other languages. The challenge organisers have gathered and normalized documents from the biomedical domain: titles from scientific a...
متن کاملGenerating Phonetic Cognates to Handle Named Entities in English-Chinese Cross-Language Spoken Document Retrieval
We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllable...
متن کاملJapanese Term Extraction Using Dictionary Hierarchy and Machine Translation System
There have been many studies of automatic term recognition (ATR) and they have achieved good results. However, they focus on a mono-lingual term extraction method. Therefore, it is difficult to extract terms from documents in foreign languages. This paper describes an automatic term extraction method from documents in foreign languages using a machine translation system. In our method, we trans...
متن کاملCross-lingual Wikification Using Multilingual Embeddings
Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wiki...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013